Introduction

Doctor Who is a British science fiction television programme broadcast by BBC. The programme depicts the adventures of a Time Lord called the Doctor, an extraterrestrial being who appears to be human. The Doctor explores the universe in a time-travelling space ship called the TARDIS.

Datatable

Here is the main dataset I will be working with today.

Adding the Doctors

Unfortunately, there is no variable for the actor who played the Doctor in each episode. This actor changes periodically and it would be interesting to see how facets of the show change with this. Therefore, I added this as another column.

episodes <- episodes %>% 
  mutate(doctor = case_when(
    season_number == "1" ~ "Christopher Eccleston", 
    season_number == "2" ~ "David Tennant",
    season_number == "3" ~ "David Tennant",
    season_number == "4" ~ "David Tennant",
    first_aired == "2008-12-25" ~ "David Tennant",
    first_aired == "2009-4-11" ~ "David Tennant",
    first_aired == "2009-11-15" ~ "David Tennant",
    first_aired == "2009-12-25" ~ "David Tennant",
    first_aired == "2010-01-01" ~ "David Tennant",
    season_number == "5" ~ "Matt Smith",
    season_number == "6" ~ "Matt Smith",
    season_number == "7" ~ "Matt Smith",
    first_aired == "2013-12-25" ~ "Matt Smith",
    season_number == "8" ~ "Peter Capaldi",
    season_number == "9" ~ "Peter Capaldi",
    season_number == "10" ~ "Peter Capaldi",
    season_number == "11" ~ "Jodie Whittaker", 
    season_number == "12" ~ "Jodie Whittaker", 
    season_number == "13" ~ "Jodie Whittaker", 
    episode_title == "The Day of the Doctor" ~ "Matt, David, and John" # Crossover episode 
  ))

Describing the data

Doctor Who has run for many years in its classic seasons (1963 - 1989) and it’s revived (2005 - present). So I wanted to see what seasons were included.

episodes <- episodes %>% 
  mutate(year = year(first_aired), 
         month = month(first_aired), 
         day = day(first_aired))

range(episodes$year) # 2005 - 2021 
## [1] 2005 2021
unique(episodes$era) # (doesn't include classic 1963 - 1989 era)
## [1] "revived"

The data here only includes episodes from 2005 onward. Despite there being a variable that codes for era (classic or revived), no data from the classic run was included in these data.

Instead, the data includes the the most recent 13 seasons, including 172 episodes, 19 of which were specials.

season_no. <- episodes %>%
  select(season_number) %>% 
  na.omit()

range(season_no.$season_number) # 13 seasons, 172 episodes
## [1]  1 13
episodes %>% 
  count(type)
## # A tibble: 2 x 2
##   type        n
##   <chr>   <int>
## 1 episode   153
## 2 special    19

The shortest episode and length (minutes) was 41 minutes long, as most episodes run for 45 minutes, +/-5.

episodes %>% 
  slice_min(duration) %>% 
  pull(episode_title, duration)
##                   41 
## "The Power of Three"
ep_short <- episodes %>% 
  select(episode_title, type, duration) %>% 
  arrange(duration)
head(ep_short) # 45 +/- 5
## # A tibble: 6 x 3
##   episode_title       type    duration
##   <chr>               <chr>      <dbl>
## 1 The Power of Three  episode       41
## 2 World War Three     episode       42
## 3 Before the Flood    episode       42
## 4 The Eaters of Light episode       42
## 5 Flatline            episode       43
## 6 Under the Lake      episode       43

However, the longest episodes were much longer than this 45+/-5. The longest episode was The Day of the Doctor, which included 3 separate Doctors and ran for 77 minutes. Most other episodes that ran for similar times were specials.

episodes %>% 
  slice_max(duration) %>% 
  pull(episode_title, duration) # crossover episode 
##                      77 
## "The Day of the Doctor"
ep_long <- episodes %>% 
  select(episode_title, type, duration) %>% 
  arrange(-duration)
head(ep_long) # deep breath - PC's 1st episode 
## # A tibble: 6 x 3
##   episode_title              type    duration
##   <chr>                      <chr>      <dbl>
## 1 The Day of the Doctor      special       77
## 2 Deep Breath                episode       76
## 3 The End of Time – Part Two special       75
## 4 Voyage of the Damned       special       72
## 5 Revolution of the Daleks   special       71
## 6 The Eleventh Hour          episode       65

Ratings over time

Next I wanted to look at the episodes’ ratings over time.

view_rating <- episodes %>% 
  select(first_aired, rating, uk_viewers, episode_title, doctor) %>% 
  na.omit()

g1 <- ggplot(view_rating, aes(x = first_aired, y = rating, label =  episode_title)) +
  geom_line(colour = '#003b6f') +
  labs(y = 'Episode Rating', 
       x = "Air Date", 
       title = "Doctor Who Episode Ratings Over Time")
#g1

ggplotly(g1)
200520102015202076808488
Doctor Who Episode Ratings Over TimeAir DateEpisode Rating

You can see that following “The Day of the Doctor”, ratings for Doctor Who have never reached as high. After its follow-up episode, the Doctor became Peter Capaldi and fans were divided.

Views over time

Next I wanted to see if these poorer ratings were reflected in the show’s views.

g2 <- ggplot(view_rating, aes(x = first_aired, y = uk_viewers, label = episode_title)) +
  geom_line(colour = '#003b6f') +
  labs(y = 'UK Viewers', 
       x = "Air Date", 
       title = "Doctor Who Episode Views Over Time")
#g2

ggplotly(g2)
2005201020152020681012
Doctor Who Episode Views Over TimeAir DateUK Viewers

Viewership for Doctor Who over time seems to be cyclical, likely fluctuating with season-linked patterns. Views after the turning point episodes for ratings dropped and only reached its revious viewership for the episode debuting Jodie Whittaker, the first Doctor to be played by a woman. However, views fell after this.

This plot shows the rating and view trends side-by-side.

g4 <- ggarrange(g1, g2)
g4

# ggplotly(g4) # cannot do with plotly

Impacts of doctor actors on ratings

It is clear that the Doctor of the time has an influence on who watches. So let’s look at the distributions of ratings by doctor.

doctors_rating <- episodes %>%
  select(rating, doctor, episode_title) %>%
  filter(!episode_title == "The Day of the Doctor")

doctor_order <- c("Jodie Whittaker", "Peter Capaldi", "Matt Smith", "David Tennant", "Christopher Eccleston")

g5 <- ggplot(doctors_rating, aes(x = factor(doctor, level = doctor_order), y = rating, colour = doctor, label = episode_title)) +
  geom_jitter() +
  coord_flip() +
  theme(legend.position = "none") +
  labs(x = "Doctor Actor", 
       y = "Episode Rating", 
       title = "Episode Ratings By Doctor")
#g5

ggplotly(g5)
75808590Jodie WhittakerPeter CapaldiMatt SmithDavid TennantChristopher Eccleston
Episode Ratings By DoctorEpisode RatingDoctor Actor

C.E. Had a variety of ratings, D.T. (except for 1 episode) and M.S. were consistently rated highly, then ratings dropped for P.C. and J.W. The ratings seem to cluster nicely with the doctor.

Impacts of writers on ratings

I then wanted to see if the writer behind the episodes also influenced the ratings.

writers_episodes <- left_join(episodes, writers)
## Joining, by = "story_number"
writer_length <- writers_episodes %>% 
  group_by(writer) %>% 
  mutate(eps_written = n()) %>% 
  ungroup() %>% 
  filter(eps_written > 3)

g6 <- ggplot(writer_length, aes(x = reorder(writer, eps_written), y = rating, colour = writer, label = episode_title, label2 = season_number)) +
  geom_jitter() +
  coord_flip() +
  theme(legend.position = "none") +
  labs(x = "Writer", 
       y = "Epsode Rating", 
       title = "Episode Ratings by Writer")
#g6

ggplotly(g6)
7680848892Helen RaynorJamie MathiesonPeter HarnessGareth RobertsToby WhithouseMark GatissChris ChibnallRussell T DaviesSteven Moffat
Episode Ratings by WriterEpsode RatingWriter

Here it seems that the more episodes you write, the more varied your ratings are, excluding Steven Moffat who had fairly consistent ratings.

The rewards of good writing

I then wanted to see if the people responsible for good or bad ratings were subsequently involved in more or less episodes. Although the length of when a doctor stays is consistent regardless of ratings, I wanted to see if there was a relationship between the average rating of the writers’ episodes, and how many episodes they were invited to write.

writer_length_1 <- writers_episodes %>% 
  select(writer, rating) %>% 
  group_by(writer) %>% 
  mutate(eps_written = n()) %>% 
  na.omit() %>% 
  mutate(mean_rating = mean(rating)) %>% 
  select(writer, eps_written, mean_rating) %>% 
  ungroup() %>% 
  distinct() %>% 
  #filter(eps_written > 3) %>% 
  arrange(mean_rating)

g7 <- ggplot(writer_length_1, aes(x = eps_written, y = mean_rating, colour = writer)) +
  geom_point() +
  theme(legend.position = 'none') +
  labs(y = 'Average Episode Rating', 
       x = "No. of Episodes Written", 
       title = "Relationship Between Episodes Written and Average Rating")
#g7

ggplotly(g7)
0102030405080.082.585.0
Relationship Between Episodes Written and Average RatingNo. of Episodes WrittenAverage Episode Rating

I then decided to visualise this in a different and more fun way.

writer_length_2 <- writer_length_1 %>% 
  filter(eps_written > 3)

g8 <- ggplot(data = writer_length_2) +
  geom_link(aes(x = mean_rating, xend = mean_rating, y = 0, yend = eps_written, colour = writer), size = 1.65, alpha = 0.6) +
  #geom_point(aes(y=eps_written, x=mean_rating, color=writer), size=2) +
  geom_image(mapping = aes(x = mean_rating, y = eps_written, image = '\\Users\\hayle\\OneDrive\\Attachments\\Hayley\\Tidy Tuesday\\tardis-removebg-preview.png'), size = 0.05) +
  coord_polar(theta = "y", clip="off", start = 0) +
  scale_x_continuous(limits = c(79, 90)) +
  scale_y_continuous(limits = c(NA, 60)) +
  theme_void() 
  #annotate(geom="text", label = "", x=75, y=50, vjust=1.5, color= "purple", size=0)
g8

Bloopers

Here is an outtake of an ugly, tardis-filled graph.

g3 <- ggplot(view_rating, aes(x = first_aired, y = rating, label = episode_title)) +
  geom_line(colour = '#003b6f') +
  geom_image(mapping = aes(x = first_aired, y = rating, image = '\\Users\\hayle\\OneDrive\\Attachments\\Hayley\\Tidy Tuesday\\tardis-removebg-preview.png'))

g3